Chroma from Luma (CfL)

Outline

  • Intro
    • Overall description
    • Other CfL Implementation
    • How this is different
  • Fundamentals
    • Model fitting
    • Simplifications
  • Quantization
    • Alphabet size
    • Alphabet values
  • Signaling
  • Integration with AV1
  • Results

In [1]:
%matplotlib inline

import numpy as np
from scipy.ndimage  import imread
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

from numpy import genfromtxt
import os

from scipy.cluster.vq import kmeans2

class ListTable(list):
    """ Overridden list class which takes a 2-dimensional list of 
        the form [[1,2,3],[4,5,6]], and renders an HTML Table in 
        IPython Notebook. """
    
    def _repr_html_(self):
        html = ["<table>"]
        for row in self:
            html.append("<tr>")
            
            for col in row:
                html.append("<td>{0}</td>".format(col))
            
            html.append("</tr>")
        html.append("</table>")
        return ''.join(html)
    
def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])

    """

    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = np.int(n / arrays[0].size)
    out[:,0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m,1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m,1:] = out[0:m,1:]
    return out

In [2]:
from IPython.display import HTML
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 $('#toggle').attr('value', 'Show code')  
 } else {
 $('div.input').show();
 $('#toggle').attr('value', 'Hide code') 
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggle" value="Show code"></form>''')


Out[2]:

Introduction

Chroma from Luma (CfL) is a coding tool that predicts information in the chromatic planes based on previously encoded information in the Luma plane. The underlying assumption is that a correlation exists between the Luma plane and its chromatic counterparts. This correlation can be seen in the following image, notice the resemblance between the chromatic planes (Cr and Cr) and the Luma plane.


In [3]:
def showPlanes(im):
    plt.figure(figsize=(20,15))
    plt.subplot(1,3,1)
    showImage(im[:,:,0], "Luma")
    plt.subplot(1,3,2)
    showImage(im[:,:,1], "Cb")
    plt.subplot(1,3,3)
    showImage(im[:,:,2], "Cr")


def showImage(im, title):
    plt.imshow(im, cmap = plt.get_cmap('gray'), vmin = 0, vmax = 255)
    plt.axis('off')
    plt.title(title)
    

im = imread("../../videos/lizard.jpg", mode="YCbCr")
showPlanes(im);


In this document, we present the proposed CfL approach for the AV1 Codec. This new version of CfL draws from the strengths of the CfL implementation of the Thor codec and the CfL implementation of the Daala codec.

In [1], Midtskogen proposes to compute $\alpha$ and $\beta$ using the predicted values of Luma and Chroma. These values are available in both the encoder and decoder. As such, they are not signaled in the bitstream. Additionally, Midtskogen uses a threshold of 64 on the mean squared error between the reconstructed Luma pixel and the predicted Luma pixel to decide whether or not CfL should be used. An important note about the work of Midtskogen is that it applies to intra frame coding and inter frame coding.

In [2], Egge and Valin propose a frequency domain version of CfL. the authors exploit the gain-shape coding nature of perceptual vector quantization (PVQ) to avoid having to perform model fitting to find $\alpha$ and $\beta$. As such, only a sign bit needs to be signaled. It is important to note that this version of CfL only predicts ACs coefficients.

The proposed solution is similar to [1] in the it is performed in the spatial domain and least squares is used. However, it differs from [1] in that $\alpha$ is signaled in the bitstream. As for $\beta$, Midtskogen computes it using the least squares equation, we propose to use DC_PRED as the value of $\beta$. This aligns with [2], in that CfL does not predict the DC.

The characteristics of the different CfL implementations are described in the following table.

Thor[1] Daala[2] Proposed
Prediction Domain Spatial Frequency Spatial
Bitstream Signaling None Sign bit Alpha
Encoder model fitting Yes Via PVQ Yes
Decoder model fitting Yes No No
Luma values used for fit Predicted Luma Luma
Chroma values used for fit Predicted Chroma Chroma

CfL Fundamentals

Although it does not always hold, the assumption of a linear relationship between the luma plane and its chromatic counterparts is not far-fetched. For example, the following image shows the relationship between the Luma and Chroma for an $8 \times 8$ block in the previous lizard image.


In [4]:
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.scatter(im[300:308,500:508,0].ravel(), im[300:308,500:508,1].ravel(), c="blue", alpha=0.5, s=25)
plt.xlabel("Luma")
plt.ylabel("Cb")
plt.gca().set_ylim([95, 135])
plt.subplot(1,2,2)
plt.scatter(im[300:308,500:508,0].ravel(), im[300:308,500:508,2].ravel(), c="red", alpha=0.5, s=25)
plt.xlabel("Luma")
plt.ylabel("Cr")
plt.gca().set_ylim([95, 135]);


Assuming that this relationship also exists between the $i$th reconstructed Luma pixel $\hat{L}_i$ and $i$th Chroma pixel $C_i$, a chromatic prediction of $C_i$ can be computed using the following equation

$$C'_i = \alpha \times \hat{L}_i + \beta,$$

where $\alpha$ is the slope and $\beta$ is y intercept. For a block of $N$ pixels, $\alpha$ and $\beta$ are computed with the least squares equations as follows

$$\alpha = \frac{N \sum_i \hat{L}_i C_i - \sum_i \hat{L}_i \sum_i C_i}{N \sum_i L_i^2 - (\sum \hat{L}_i)^2},$$$$\beta = \frac{\sum_i C_i - \alpha \sum_i \hat{L}_i}{N}.$$

As explained in [1], signaling $\alpha$ and $\beta$ on a per block basis is too expensive. To avoid signaling these parameters, Midtskogen proposes instead to compute $\alpha'$ and $\beta'$ by replacing the actual values $\hat{L}$ and $C$ by their predicted counterparts $L'$ and $C'$.

The advantage of such an approach is that $L'$ and $C'$ are available to both the encoder and decoder, the model fitting is performed during encoding and during decoding, thus signaling alpha and beta is avoided. The disadvantage of this approach is that the model will fit the prediction but not the actual data. In the context of intra coding, the prediction might significantly differ from the actual values. This loss of precision will increase distortion and rate.

Fitting the model on the real values of $\hat{L}$ and $C$ generates a more precise model. However, this information is not available to the decoder and must be signaled in the bitstream. In this proposal, we will present different techniques to reduce the rate required to signal the model fitting parameters.

Let $\bar{L}$ be the subtraction of $\hat{L}$ by its average for a given block. Notice that $\bar{L}$ is zero mean, it follows that

$$\sum_i \bar{L_i} = 0.$$

Notice that when we replace $\hat{L}$ with $\bar{L}$ in the equation of $\beta$, we get the average chromatic pixel value. It just so happens that the intra prediction mode DC_PRED is designed to predict the average pixel value.

Next, let $\bar{C'}$ be the subtraction of $C$ by the predicted average using DC_PRED.

Replacing $\hat{L}$ by $\bar{L}$ and $C$ with $\bar{C'}$ in the equations of $\alpha$ and $\beta$ yields

$$\bar{\alpha} = \frac{\sum_i \bar{L}_i \bar{C'}_i}{\sum_i \bar{L}_i^2},$$$$\bar{\beta} = \frac{\sum_i \bar{C'}_i}{N}.$$

Notice that $\bar{\beta}$ is the average of the residual of the chromatic block.

Based on the assumption that $\bar{\beta} \approx 0$, we predict chroma as follows

$$ C''_i = \bar{\alpha} \times \bar{L_i} + DC\_PRED $$

_Note that it could also be possible to signal $\sum_i \bar{L_i} \bar{C'_i}$ instead of $\alpha$. However, signaling $\alpha$ give better results._


In [201]:
block_size = 8

def psnr(im1, im2):
    h, w = im1.shape
    sse = np.sum((im1 - im2)**2)
    return 20 * np.log10(255) - 10 * np.log10(sse/(h*w))

def cfl(im, block_size, plane):
    height, width, z = im.shape
    cfl = np.array(np.zeros((height, width)), dtype='uint8')
    
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            bY = im[y:y+block_size, x:x+block_size, 0]
            bC = im[y:y+block_size, x:x+block_size, plane]
            
            L = bY - np.mean(bY)
            C = bC - np.mean(bC)
            
            sLL = np.sum(L*L)
            sLC = np.sum(L*C)
            
            if sLL != 0:
                a = sLC / sLL
            else:
                a = 0
            cfl[y:y+block_size, x:x+block_size] = np.round(a * L + np.mean(bC))
    return cfl

def cfl_prop(im, block_size, plane):
    height, width, z = im.shape
    cfl = np.array(np.zeros((height, width)), dtype='uint8')
    alphas = []
    
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            bY = im[y:y+block_size, x:x+block_size, 0]
            bC = im[y:y+block_size, x:x+block_size, plane]
            
            above_row = im[max(y-1,0), x:x+block_size, plane]
            left_col = im[y:y+block_size, max(x-1,0), plane]
            DC_PRED = np.mean([above_row, left_col])
            
            L = bY - np.mean(bY)
            C = bC - DC_PRED
            
            sLL = np.sum(L*L)
            sLC = np.sum(L*C)
            
            if sLL != 0:
                alpha = sLC / sLL
            else:
                alpha = 0
            alphas.append(alpha)
            cfl[y:y+block_size, x:x+block_size] = np.round(alpha * L + DC_PRED)
            
    return cfl, alphas

cfl_prob_cb, alphas_cr = cfl_prop(im, block_size, 1)
cfl_prob_cr, alphas_cb = cfl_prop(im, block_size, 2)

cfl_cb = cfl(im, 8, 1)
cfl_cr = cfl(im, 8, 2)
    
plt.figure(figsize=(15,15))
plt.subplot(3,2,1)
showImage(im[:,:,1], "Cb")
plt.subplot(3,2,2)
showImage(im[:,:,2], "Cr")
plt.subplot(3,2,3)
showImage(cfl_cb, "CfL_Cb (PSNR: %0.2f dB)\nAlpha and Beta computed" % psnr(im[:,:,1], cfl_cb))
plt.subplot(3,2,4)
showImage(cfl_cr, "CfL_Cr (PSNR: %0.2f dB)\nAlpha and Beta computed" % psnr(im[:,:,2], cfl_cr))
plt.subplot(3,2,5)
showImage(cfl_prob_cb, "Proposed CfL_Cb (PSNR: %0.2f dB)\nAlpha computed, Beta predicted" % psnr(im[:,:,1], cfl_prob_cb))
plt.subplot(3,2,6)
showImage(cfl_prob_cr, "Proposed CfL_Cb (PSNR: %0.2f dB)\nAlpha computed, Beta predicted" % psnr(im[:,:,2], cfl_prob_cr))


For a more general overview of the precision of the proposed approach, the following table shows the quality loss of the proposed CfL scheme on the images of the Kodak Lossless True Color Image Suite


In [204]:
img_folder = "../../videos/kodim/"
table = ListTable()
table.append(['File name', 'PSNR CfL Cb', 'PSNR CfL Proposed Cb', 'PSNR CfL Cb', 'PSNR CfL Proposed Cr'])
i = 0
psnr_cfl_cb = np.zeros((24,1))
psnr_cfl_prop_cb = np.zeros((24,1))
psnr_cfl_cr = np.zeros((24,1))
psnr_cfl_prop_cr = np.zeros((24,1))
alphas_cb = np.array([])
alphas_cr = np.array([])
for file in sorted(os.listdir(img_folder)):
    if file.endswith(".png"):
        kodim = imread(os.path.join(img_folder, file), mode="YCbCr")
        cfl_cb, a = cfl_prop(kodim, block_size, 1)
        alphas_cb = np.concatenate((alphas_cb, a), axis=0)
        cfl_cr, a = cfl_prop(kodim, block_size, 2)
        alphas_cr = np.concatenate((alphas_cr, a), axis=0)
        psnr_cfl_cb[i] = round(psnr(kodim[:, :, 1], cfl(kodim, block_size, 1)),2)
        psnr_cfl_prop_cb[i] = round(psnr(kodim[:, :, 1], cfl_cb),2)
        psnr_cfl_cr[i] = round(psnr(kodim[:, :, 2], cfl(kodim, block_size, 2)),2)
        psnr_cfl_prop_cr[i] = round(psnr(kodim[:, :, 2], cfl_cr),2)
        table.append([file, str(psnr_cfl_cb[i]), str(psnr_cfl_prop_cb[i]), str(psnr_cfl_cr[i]), str(psnr_cfl_prop_cr[i])])
    i = i + 1
table.append(['Average', round(np.mean(psnr_cfl_cb),2), round(np.mean(psnr_cfl_prop_cb),2), round(np.mean(psnr_cfl_cr),2), round(np.mean(psnr_cfl_prop_cr),2)])
table


Out[204]:
File namePSNR CfL CbPSNR CfL Proposed CbPSNR CfL CbPSNR CfL Proposed Cr
kodim01.png[ 44.37][ 42.45][ 41.77][ 38.62]
kodim02.png[ 41.26][ 39.83][ 37.36][ 36.06]
kodim03.png[ 43.38][ 40.63][ 44.06][ 42.05]
kodim04.png[ 46.3][ 43.07][ 38.93][ 36.3]
kodim05.png[ 38.65][ 36.42][ 38.68][ 36.7]
kodim06.png[ 44.][ 40.94][ 43.83][ 41.78]
kodim07.png[ 40.88][ 38.33][ 42.82][ 40.23]
kodim08.png[ 41.29][ 38.05][ 39.05][ 36.76]
kodim09.png[ 43.99][ 41.2][ 44.88][ 42.5]
kodim10.png[ 44.69][ 41.3][ 44.53][ 42.02]
kodim11.png[ 43.82][ 41.44][ 40.63][ 38.49]
kodim12.png[ 45.58][ 43.11][ 44.66][ 41.31]
kodim13.png[ 40.12][ 38.16][ 43.47][ 41.37]
kodim14.png[ 38.88][ 36.63][ 38.73][ 36.99]
kodim15.png[ 44.04][ 41.27][ 39.59][ 37.52]
kodim16.png[ 46.13][ 42.65][ 48.][ 45.56]
kodim17.png[ 44.91][ 42.86][ 46.9][ 44.41]
kodim18.png[ 40.37][ 37.61][ 39.94][ 37.78]
kodim19.png[ 44.78][ 41.86][ 44.58][ 42.12]
kodim20.png[ 43.16][ 40.1][ 45.67][ 42.84]
kodim21.png[ 43.05][ 40.09][ 44.39][ 41.41]
kodim22.png[ 40.63][ 38.34][ 40.81][ 38.63]
kodim23.png[ 42.04][ 38.92][ 41.46][ 38.61]
kodim24.png[ 39.54][ 37.75][ 41.51][ 39.67]
Average42.7440.1342.3439.99

The PSNR reduction indicates the cost associated of assuming that $\bar{\beta} \approx 0$.

Quantization

As we already mentioned, signaling one $\bar{\alpha}$ for each chromatic plane for every block in a frame is quite expensive. As we will show, quantization is very effective, because of the difference of scale between $\bar{\alpha}$ and the pixel themselves.

The following histogram demonstrates the values of $\bar{\alpha}$ measured while performing the proposed CfL algorithm over the 24 images of the Kodak Lossless True Color Image Suite.


In [205]:
plt.figure(figsize=(15,5))
n, bins, patches = plt.hist([alphas_cb, alphas_cr], 50, normed=1, alpha=0.75, label=["Cb", "Cr"])
plt.legend(title="Alpha");


Based on this histogram, we can safely assume that $\bar{\alpha}$ is most likely between -1.5 and 1.5. Interestingly, this assumption generalizes to AV1. The following histogram shows the values of $\bar{\alpha}$ recorded during the execution of the AV1 encoder for QPs 20, 32, 43, 55, 63 for all the images contained in the subset-1. We removed outliers by discarding the values smaller than the first percentile and values greater than the last percentile.


In [42]:
def remove_outliers(alphas):
    q99, q1 = np.percentile(alphas, [99 ,1])
    return np.delete(alphas, np.where(np.logical_or(alphas < q1, alphas > q99)))

def load_alphas(qp):
    alphas = []
    for file in os.listdir(str(qp)):
        if file.endswith(".csv"):
            alphas = np.concatenate( (alphas, genfromtxt(os.path.join(str(qp), file), delimiter=',')), axis=0)
    return alphas


alphas_20 = remove_outliers(load_alphas(20))
alphas_32 = remove_outliers(load_alphas(32))
alphas_43 = remove_outliers(load_alphas(43))
alphas_55 = remove_outliers(load_alphas(55))
alphas_63 = remove_outliers(load_alphas(63))

In [43]:
plt.figure(figsize=(15,10))
n, bins, patches = plt.hist([alphas_20, alphas_32, alphas_43, alphas_55, alphas_63], 50, normed=1, alpha=0.75, label=['20', '32', '43', '55', '63'])
plt.legend(title="QP");


Now that we know the range of $\bar{\alpha},$ let's consider the impact of quantization. As the histogram suggests, a fixed quantization step size is not desired. To obtain a non-uniform quantization table, we resort to the k-means clustering (Lloyd's algorithm).

In order to evaluate the impact of the alphabet size used for $\bar{\alpha},$ we compute all the quantization tables with alphabets of sizes 3 to 16. These alphabets are then used for CfL and the resulting PSNR is measured over the entire Kodak Lossless True Color Image Suite. We make the assumption here that this generalizes to AV1.


In [9]:
def cfl_prop_q(im, block_size, plane, codes):
    height, width, z = im.shape
    cfl = np.array(np.zeros((height, width)), dtype='uint8')
    alphas = []
    
    num_codes = len(codes)
    for c in range(0, num_codes-1):
        table[c] = (codes[c] + codes[c+1]) / 2
    
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            bY = im[y:y+block_size, x:x+block_size, 0]
            bC = im[y:y+block_size, x:x+block_size, plane]
            
            above_row = im[max(y-1,0), x:x+block_size, plane]
            left_col = im[y:y+block_size, max(x-1,0), plane]
            DC_PRED = np.mean([above_row, left_col])
            
            L = bY - np.mean(bY)
            C = bC - DC_PRED
            
            sLL = np.sum(L*L)
            sLC = np.sum(L*C)
            
            if sLL != 0:
                alpha = sLC / sLL
            else:
                alpha = 0
            alphas.append(alpha)
            
            i = 0
            while i < num_codes-1 and alpha > table[i]:
                i = i + 1
            alpha_q = codes[i]
            
            cfl[y:y+block_size, x:x+block_size] = np.round(alpha_q * L + DC_PRED)
            
    return cfl, alphas

In [27]:
k_psnrs_cb = np.zeros((17,24));
psnr_cb = np.zeros((24,1));
k_psnrs_cr = np.zeros((17,24));
psnr_cr = np.zeros((24,1));
f = 0
for file in sorted(os.listdir(img_folder)):
    if file.endswith(".png"):
        kodim = imread(os.path.join(img_folder, file), mode="YCbCr")
        for k in range(3,17):
            codes_cb, label = kmeans2(alphas_cb, k)
            codes_cr, label = kmeans2(alphas_cr, k)
            cfl_cb, alpha = cfl_prop_q(kodim, block_size, 1, np.sort(codes_cb))
            cfl_cr, alpha = cfl_prop_q(kodim, block_size, 2, np.sort(codes_cr))
            k_psnrs_cb[k, f] = psnr(kodim[:,:,1], cfl_cb)
            k_psnrs_cr[k, f] = psnr(kodim[:,:,2], cfl_cr)
            
        cfl_cb, alphas = cfl_prop(kodim, block_size, 1)
        cfl_cr, alphas = cfl_prop(kodim, block_size, 2)
        psnr_cb[f] = psnr(kodim[:,:,1], cfl_cb)
        psnr_cr[f] = psnr(kodim[:,:,2], cfl_cr)
        f = f + 1;

In [166]:
plt.figure(figsize=(14,14))
plt.subplot(2,1,1);
plt.plot(np.arange(0,17), np.ones((17,1)) * np.mean(psnr_cb), 'g', label='CfL')
plt.plot(np.arange(0,17), np.mean(k_psnrs_cb, axis=1), 'b', label='Quantized CfL')
plt.gca().set_xlim([3, 16])
plt.gca().set_ylim([39, 41])
plt.xlabel('Alpha alphabet size')
plt.ylabel('PSNR-Cb dB');
plt.title('Average Cb visual quality of quantization\nfor the Kodak Lossless True Color Image Suite')
plt.legend()

plt.subplot(2,1,2);
plt.plot(np.arange(0,17), np.ones((17,1)) * np.mean(psnr_cr), 'g', label='CfL')
plt.plot(np.arange(0,17), np.mean(k_psnrs_cr, axis=1), 'r', label='Quantized CfL')
plt.gca().set_xlim([3, 16])
plt.gca().set_ylim([39, 41])
plt.title('Average Cr visual quality of quantization\nfor the Kodak Lossless True Color Image Suite')
plt.xlabel('Alpha alphabet size')
plt.ylabel('PSNR-Cr (dB)');
plt.legend();


It is important to notice that using only an alphabet of 3 $\bar{\alpha}\text{s}$, the cost of quantization is less than 1.0 dB. Furthermore, diminishing returns are observed past an alphabet of 8 $\bar{\alpha}\text{s}$. As we will explain in the next section, an alphabet of 8 $\alpha$s is particularly convenient, as once the sign bit is removed, both $\bar{\alpha}\text{s}$ can fit in one 4 bit symbol used by AV1's entropy coder.

TODO

  • Show chosen tables and codes based on AV1 Data

2D Quantization

A one-dimensional quantization scheme would use 2 independent alphabets for $\bar{\alpha}$ Cb and for $\bar{\alpha}$ Cr. For example, based on the $\bar{\alpha}\text{s}$ obtained by applying CfL over the Kodak Lossless True Color Image Suite, we get two alphabets of size 4 shown in the following Voronoi cells.


In [306]:
from matplotlib import colors, ticker, cm

plt.figure(figsize=(10,8))
plt.gca().set_xlim([-4, 4])
plt.gca().set_ylim([-4, 4])
plt.hexbin(alphas_cb, alphas_cr, bins='log', cmap='Blues', extent=[-4, 4, -4, 4])
plt.xlabel("Alpha Cb")
plt.ylabel("Alpha Cr")


plt.title("Log scale heatmap of the alpha values for the Kodak True Color Image Suite");
cb = plt.colorbar();
cb.set_label('log10(N)')
plt.show()



In [309]:
centers_cb, label = kmeans2(alphas_cb, 4)
centers_cr, label = kmeans2(alphas_cr, 4)

vor1D = Voronoi(cartesian((np.sort(centers_cb), np.sort(centers_cr))))

plt.figure(figsize=(10,8))

plt.hexbin(alphas_cb, alphas_cr, bins='log', cmap='Blues', extent=[-4, 4, -4, 4])
plt.xlabel("Alpha Cb")
plt.ylabel("Alpha Cr")
plt.title(("1D Quantization of Alpha Cb and Alpha Cr\n 16 Codes (4 codes for Cb x 4 codes for Cr)"));
cb = plt.colorbar();
cb.set_label('log10(N)')

voronoi_plot_2d(vor1D, ax=plt.gca(), show_points=False, show_vertices=False)
plt.scatter(centers[:,0], centers[:,1], c='black', marker='.', s=45);

plt.gca().set_xlim([-4, 4])
plt.gca().set_ylim([-4, 4])


Out[309]:
(-4, 4)

In this section, we investigate the impact on chromatic visual quality of using one code to represent both chromatic $\bar{\alpha}\text{s}$, hence two-dimensional quantization.

The previous two alphabets of size 4 results in a combined alphabet of size 16. Instead, we use an alphabet of 16 to represent different combinations of $\bar{\alpha}$ for Cb and Cr. A combined alphabet will be more precise if both $\bar{\alpha}\text{s}$ are not independent. To determine this, let's consider a two-dimensional plane where the x-axis is the value of $\bar{\alpha}$ for the Cb plane and the y-axis is the $\bar{\alpha}$ of the Cr plane. The following figure shows such a plane for the previously recorded values of $\bar{\alpha}$ over the entire Kodak True Color Image Suite.

Independent $\bar{\alpha}\text{s}$ would be uniformly distributed over the plane. The previous image clearly shows a dependency between both $\bar{\alpha}\text{s}$ as the values increase.

Using an alphabet of size 16 to represent a combination of a value of $\bar{\alpha}$ for Cb and a value of $\bar{\alpha}$ for Cr, we split the plane into the following Voronoi cells.


In [312]:
a = np.column_stack((alphas_cb, alphas_cr))

centers, label = kmeans2(a, 16)
vor2D = Voronoi(centers)

plt.figure(figsize=(10,8))

plt.hexbin(alphas_cb, alphas_cr, bins='log', cmap='Blues', extent=[-4, 4, -4, 4])
plt.xlabel("Alpha Cb")
plt.ylabel("Alpha Cr")


plt.title("Log scale heatmap of the alpha values for the Kodak True Color Image Suite")
cb = plt.colorbar();
cb.set_label('log10(N)')

voronoi_plot_2d(vor2D, ax=plt.gca(), show_points=False, show_vertices=False)
plt.gca().set_xlim([xedges[0], xedges[-1]])
plt.gca().set_ylim([yedges[0], yedges[-1]])
plt.scatter(centers[:,0], centers[:,1], c='black', marker='.', s=45);

plt.gca().set_xlim([-4, 4])
plt.gca().set_ylim([-4, 4])


Out[312]:
(-4, 4)

In the following plot, we compare the average chromatic visual quality with respect to the combined alphabet size of both quantization scheme over the Kodak Lossless True Color Image Suite.


In [209]:
def cfl_prop_q2(im, block_size, codes):
    height, width, z = im.shape
    cfl_cb = np.array(np.zeros((height, width)), dtype='uint8')
    cfl_cr = np.array(np.zeros((height, width)), dtype='uint8')
    indices = []
    
    num_codes, z = codes.shape
    
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            bY = im[y:y+block_size, x:x+block_size, 0]
            bCb = im[y:y+block_size, x:x+block_size, 1]
            bCr = im[y:y+block_size, x:x+block_size, 2]
            
            above_row_cb = im[max(y-1,0), x:x+block_size, 1]
            left_col_cb = im[y:y+block_size, max(x-1,0), 1]
            dc_pred_cb = np.mean([above_row_cb, left_col_cb])
            
            above_row_cr = im[max(y-1,0), x:x+block_size, 2]
            left_col_cr = im[y:y+block_size, max(x-1,0), 2]
            dc_pred_cr = np.mean([above_row_cr, left_col_cr])
            
            L = bY - np.mean(bY)
            Cb = bCb - dc_pred_cb
            Cr = bCr - dc_pred_cr
            
            sLL = np.sum(L*L)
            sLCb = np.sum(L*Cb)
            sLCr = np.sum(L*Cr)
            
            if sLL != 0:
                alpha_cb = sLCb / sLL
                alpha_cr = sLCr / sLL
            else:
                alpha_cb = 0
                alpha_cr = 0
            
            # Euclidian distance, we don't need to do sqrt because we want min
            smallest = (codes[0, 0] - alpha_cb)**2 + (codes[0, 1] - alpha_cr)**2
            ind = 0
            for c in range(1, num_codes):
                d = (codes[c,0] - alpha_cb)**2 + (codes[c,1] - alpha_cr)**2
                if d < smallest:
                    smallest = d
                    ind = c
            indices.append(ind)
            
            alpha_cb_q = codes[ind, 0]
            alpha_cr_q = codes[ind, 1]
            
            cfl_cb[y:y+block_size, x:x+block_size] = np.round(alpha_cb_q * L + dc_pred_cb)
            cfl_cr[y:y+block_size, x:x+block_size] = np.round(alpha_cr_q * L + dc_pred_cr)
            
    return cfl_cb, cfl_cr, indices

In [210]:
k_psnrs_cb_q2 = np.zeros((17,24));
psnr_cb = np.zeros((24,1));
k_psnrs_cr_q2 = np.zeros((17,24));
psnr_cr = np.zeros((24,1));
f = 0

k_range = np.arange(0,17)
k2_range = k_range**2
a = np.column_stack((alphas_cb, alphas_cr))

for file in sorted(os.listdir(img_folder)):
    if file.endswith(".png"):
        kodim = imread(os.path.join(img_folder, file), mode="YCbCr")
        for k in np.arange(3,17):
            codes_2d, labels = kmeans2(a, k**2, check_finite=False)
            _cfl_cb_q2, _cfl_cr_q2, ind = cfl_prop_q2(kodim, block_size, codes_2d)
            k_psnrs_cb_q2[k, f] = psnr(kodim[:,:,1], _cfl_cb_q2)
            k_psnrs_cr_q2[k, f] = psnr(kodim[:,:,2], _cfl_cr_q2)
            
        _cfl_cb, alphas = cfl_prop(kodim, block_size, 1)
        _cfl_cr, alphas = cfl_prop(kodim, block_size, 2)
        psnr_cb[f] = psnr(kodim[:,:,1], _cfl_cb)
        psnr_cr[f] = psnr(kodim[:,:,2], _cfl_cr)
        f = f + 1;


/usr/local/lib/python3.5/dist-packages/scipy/cluster/vq.py:653: UserWarning: One of the clusters is empty. Re-run kmean with a different initialization.
  warnings.warn("One of the clusters is empty. "

In [216]:
mean_k_psnrs_q2 = np.mean(np.column_stack((np.mean(k_psnrs_cb_q2, axis=1), np.mean(k_psnrs_cr_q2, axis=1))), axis=1)
mean_k_psnrs = np.mean(np.column_stack((np.mean(k_psnrs_cb, axis=1), np.mean(k_psnrs_cr, axis=1))), axis=1)
mean_psnrs = np.ones((17,1)) * np.mean((psnr_cb, psnr_cr)) 

plt.figure(figsize=(15,5))
plt.plot(k2_range, mean_psnrs, 'g', label='CfL')
plt.plot(k2_range, mean_k_psnrs, 'b', marker='.', label='Quantized CfL')
plt.plot(k2_range, mean_k_psnrs_q2, 'b--', marker='.', label='2D Quantized CfL')

plt.gca().set_xticks(k2_range)
plt.gca().set_xlim([3**2, 17**2])
plt.gca().set_ylim([39, 41])

plt.xlabel('Total alpha alphabet size (Cb Alphabet + Cr Alphabet)')
plt.ylabel('Average PSNR for both chromatic planes (dB)');
plt.title('Average chromatic visual quality of 1D and 2D quantization\nfor the Kodak Lossless True Color Image Suite')
plt.legend();



In [215]:
k_range = np.arange(0,17)
k2_range = k_range**2

From the previous plot, we see that quantizing the combination of both $\bar{\alpha}\text{s}$ results in better chromatic visual quality than quantizing them separately.

Signaling

The proposed signaling scheme is based on the previous findings that the benefits of an alphabet size greater than 8 $\bar{\alpha}\text{s}$ are relatively small and that the entropy coding tools in AV1 support cumulative distribution functions of up to 16 symbols. We propose to signal $\bar{\alpha}$ at the block level and combine the $\bar{\alpha}$ for Cb and the $\bar{\alpha}$ for Cr into the same symbol.

Combining $\bar{\alpha}$ Cb and $\bar{\alpha}$ Cr in the same symbol exploits the fact that some combinations of $\bar{\alpha}$ Cb and $\bar{\alpha}$ Cr are highly improbable. In order to increase the $\bar{\alpha}$ alphabet without using another symbol, we code a seperate sign bit for $\bar{\alpha}$ Cb and $\bar{\alpha}$ Cr. By adding "0" to the alphabet, we can avoid signaling the sign bit when "0" is chosen. However, doing so reduces the number of codes from 8 to 7.

TODO

  • Show signaling at the block level vs signaling at the transform level
  • Show the joint probabilities of alpha Cb and alpha Cr
  • Make probability table
  • Add skip special case
  • Is it worth it to use seperate codes and tables for Cb and Cr?

Results

Current best results for Subset1:

PSNR PSNR Cb PSNR Cr PSNR HVS SSIM MS SSIM CIEDE 2000
1.4037 -13.0741 -11.6908 1.5487 1.4272 1.3995 -3.4491

https://arewecompressedyet.com/?job=master%402017-03-27T18%3A41%3A56.236Z&job=CfL%402017-04-05T14%3A23%3A57.535Z

References

[1] Steinar Midtskogen,"Improved chroma prediction" IETF draft-midtskogen-netvc-chromapred-02 https://tools.ietf.org/html/draft-midtskogen-netvc-chromapred-02 (October 2016)

[2] Nathan E. Egge and Jean-Marc Valin, "Predicting Chroma from Luma with Frequency Domain Intra Prediction", https://people.xiph.org/~unlord/spie_cfl.pdf (April 2016)